{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1f3a5ebf",
   "metadata": {},
   "source": [
    "# Airbyte JSON"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
   "metadata": {},
   "source": [
    ">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fe72234-3110-4c07-a766-3dc505dd25cc",
   "metadata": {},
   "source": [
    "This covers how to load any source from Airbyte into a local JSON file that can be read in as a document\n",
    "\n",
    "Prereqs:\n",
    "Have docker desktop installed\n",
    "\n",
    "Steps:\n",
    "\n",
    "1) Clone Airbyte from GitHub - `git clone https://github.com/airbytehq/airbyte.git`\n",
    "\n",
    "2) Switch into Airbyte directory - `cd airbyte`\n",
    "\n",
    "3) Start Airbyte - `docker compose up`\n",
    "\n",
    "4) In your browser, just visit http://localhost:8000. You will be asked for a username and password. By default, that's username `airbyte` and password `password`.\n",
    "\n",
    "5) Setup any source you wish.\n",
    "\n",
    "6) Set destination as Local JSON, with specified destination path - lets say `/json_data`. Set up manual sync.\n",
    "\n",
    "7) Run the connection.\n",
    "\n",
    "7) To see what files are create, you can navigate to: `file:///tmp/airbyte_local`\n",
    "\n",
    "8) Find your data and copy path. That path should be saved in the file variable below. It should start with `/tmp/airbyte_local`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "180c8b74",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import AirbyteJSONLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4af10665",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_airbyte_raw_pokemon.jsonl\n"
     ]
    }
   ],
   "source": [
    "!ls /tmp/airbyte_local/json_data/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "721d9316",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = AirbyteJSONLoader(\"/tmp/airbyte_local/json_data/_airbyte_raw_pokemon.jsonl\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9858b946",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "fca024cb",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "abilities: \n",
      "ability: \n",
      "name: blaze\n",
      "url: https://pokeapi.co/api/v2/ability/66/\n",
      "\n",
      "is_hidden: False\n",
      "slot: 1\n",
      "\n",
      "\n",
      "ability: \n",
      "name: solar-power\n",
      "url: https://pokeapi.co/api/v2/ability/94/\n",
      "\n",
      "is_hidden: True\n",
      "slot: 3\n",
      "\n",
      "base_experience: 267\n",
      "forms: \n",
      "name: charizard\n",
      "url: https://pokeapi.co/api/v2/pokemon-form/6/\n",
      "\n",
      "game_indices: \n",
      "game_index: 180\n",
      "version: \n",
      "name: red\n",
      "url: https://pokeapi.co/api/v2/version/1/\n",
      "\n",
      "\n",
      "\n",
      "game_index: 180\n",
      "version: \n",
      "name: blue\n",
      "url: https://pokeapi.co/api/v2/version/2/\n",
      "\n",
      "\n",
      "\n",
      "game_index: 180\n",
      "version: \n",
      "n\n"
     ]
    }
   ],
   "source": [
    "print(data[0].page_content[:500])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9fa002a5",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
